Distribution of polymorphic variants of CYP2A6 and their involvement in nicotine addiction

Tobacco consumption has become a major public health issue, which has motivated studies to identify and understand the biological processes involved in smoking behavior in order to improve prevention and smoking cessation treatments. CYP2A6 has been identified as the main gene encoding the enzyme that metabolizes nicotine. Many alleles have been identified since the discovery of CYP2A6, suggesting wide interethnic variability and diverse smoking behavior among allele carriers. The main purpose of this review is to update and highlight the effects of CYP2A6 gene variability on tobacco consumption as reported in diverse human populations. The review further aims to propose CYP2A6 for consideration in future studies as a possible genetic marker for the prevention and treatment of nicotine addiction. To that end, we analyzed several population studies and the importance of characterizing each population using specific parameters. Our efforts may contribute to a personalized system for detecting, preventing and treating populations at higher risk of smoking in order to avoid diseases related to tobacco consumption.


INTRODUCTION
Tobacco consumption has become an epidemic affecting more than 1,000 million people worldwide and is considered the main cause of avoidable death, causing approximately 6 million premature deaths each year (WHO, 2011). Tobacco consumption is thus a public health issue that also entails economic losses; it has motivated studies to identify and understand the biological processes involved in smoking behavior in order to improve prevention and smoking cessation treatments (Bierut et al., 2014).
Among cigarette compounds, nicotine is responsible for causing dependence by stimulating the smoker and allowing the other compounds to access the body, causing chronic harmful effects and tobacco-related diseases.
Tobacco consumption is caused by environmental, psychosocial and genetic factors (Bierut et al., 2014). Previous studies have identified genes encoding proteins that influence nicotine addictive behavior through their effect on cerebral neurotransmission pathways (Al Koudsi and Tyndale, 2005; Arinami et al., 2000; Verde Rello and Santiago Dorrego, 2013). Moreover, some gene products are involved in the nicotine response as receptors and metabolizers (Hukkanen et al., 2005; Verde Rello and Santiago Dorrego, 2013). Pérez-Rubio et al. (2015) have reviewed this subject in detail. Within the metabolizer group, the CYP2A6 gene plays an important role. Its protein product (of the same name) is the principal enzyme responsible for metabolizing nicotine to cotinine and other sub-products in the human body. However, more than 45 alleles have been discovered, suggesting wide interindividual and interethnic variability.
The main purpose of this review is to update and highlight the effects of the genetic polymorphisms of CYP2A6 related to tobacco consumption reported in diverse human populations. Moreover, we aimed to consider these polymorphisms in future studies as possible genetic markers for the prevention and treatment of nicotine addiction.

BIOLOGICAL FUNCTION OF CYP2A6 IN SMOKING
The CYP2A6 enzyme belongs to the enzyme superfamily known as the cytochrome P450 system (CYP450), which is also classified in the group of drug-metabolizing enzymes. These enzymes are found in the endoplasmic reticulum of cells in some tissues of the body, particularly in the liver. Moreover, they are phase I enzymes responsible for metabolizing more than 80 % of drugs, as well as other xenobiotics and endogenous products in the body (Evans and Relling, 2004; Ingelman-Sundberg, 2004).
CYP2A6 has been shown to be involved in the metabolism of some endogenous and exogenous substrates, such as precarcinogenic and carcinogenic compounds, as well as some toxins and drugs, including nicotine.
CYP2A6 is particularly important in tobacco consumption because of its involvement in nicotine metabolism. Nicotine is the main compound in tobacco responsible for the development of cigarette addiction: it stimulates nicotinic cholinergic receptors (nAChR), which release neurotransmitters in the brain and cause a pleasant sensation in the smoker. Nicotine availability in the body is mediated by biological factors, mainly those related to its metabolism. Smokers tend to consume the same amount of nicotine each day to achieve the desired effects, modulating their smoking behavior to regulate nicotine levels in the body (Benowitz, 1992).
CYP2A6 has been reported to be the main enzyme involved in the oxidation of nicotine to cotinine. CYP2A6 catalyzes approximately 80 % (55-92 %) of this reaction via C-oxidation, in addition to other metabolic pathways for nicotine and its metabolites (Benowitz and Jacob III, 1994; Messina et al., 1997; Nakajima et al., 1996). Other CYP450 enzymes contribute to nicotine metabolism to a lesser degree, such as CYP2B6, CYP2A13, CYP2D6 and CYP2E1 (listed in order of relevance).
These biological products either interact directly with nicotine to affect physiological brain processes (e.g., nAChR) or inactivate and remove it from the body (e.g., CYP450 enzymes), making their genes ideal candidates for influencing smoking behavior (Malaiyandi et al., 2005).

CYP2A6 GENETIC VARIANTS
The CYP2A6 gene spans approximately 6 kb and is composed of 9 exons, which encode a 494 amino-acid product (Fernandez-Salguero et al., 1995). It is located in the chromosomal band 19q13.2, where other CYP450 gene subfamilies (CYP2B, CYP2F, CYP2G, CYP2S, and CYP2T) are also present. The CYP2A subfamily cluster includes the CYP2A6, CYP2A7, and CYP2A13 genes and other pseudogenes (Hoffman et al., 2001).
The Human CYP-Allele Nomenclature Database (HCAND) (http://www.cypalleles.ki.se) was created in 1999 for the identification and characterization of CYP2A6 alleles (and those of other CYP450 genes). The database is maintained by a committee that unifies and assigns the nomenclature for alleles already discovered and for those to be discovered in the future (Ingelman-Sundberg, 2010, 2013).
To date, 42 well-characterized alleles and some haplotypes that are uncharacterized ("CYP2A6 allele nomenclature," 2014) have been identified. These alleles are determined according to the origins of their mutation(s), such as gene conversion, gene deletion, gene duplication and/or single nucleotide polymorphism (SNP). The gene mutations are summarized in Figure 1.
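The single-nucleotide changes described for the alleles in this section are written as a position followed by the reference and variant bases (for example, 1799T>A or -48T>G). As a minimal sketch of how such notation could be handled when tabulating alleles, the following Python snippet parses one substitution into its parts. The names `Substitution` and `parse_substitution` are hypothetical and introduced only for illustration; they are not part of the HCAND or of any published tool, and the sketch covers only simple substitutions, not deletions, duplications or gene conversions.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One single-nucleotide substitution, e.g. 1799T>A (hypothetical helper type)."""
    position: int  # nucleotide position as written; negative for promoter variants such as -48T>G
    ref: str       # reference base
    alt: str       # variant base


# Optional minus sign, digits, optional whitespace, reference base, '>', variant base.
_PATTERN = re.compile(r"^(-?\d+)\s*([ACGT])>([ACGT])$")


def parse_substitution(text: str) -> Substitution:
    """Parse strings such as '1799T>A', '-48T>G' or '2134 A>G'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution: {text!r}")
    position, ref, alt = match.groups()
    return Substitution(int(position), ref, alt)


if __name__ == "__main__":
    for label in ("1799T>A", "-48T>G", "2134 A>G"):
        print(label, "->", parse_substitution(label))
```

Keeping the position as a signed integer makes it straightforward to separate promoter variants from those within the gene, under the assumption that negative positions denote upstream sites as in the -48T>G TATA box variant.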
CYP2A6*2 consists of a missense mutation, 1799T>A, causing the amino acid change Leu160His in the enzyme. As a result, the protein does not incorporate the heme group, and the enzyme is inactive in both in vitro and in vivo assays (Benowitz et al., 1995; Hadidi et al., 1997; Oscarson et al., 1999b; Yamano et al., 1990).
CYP2A6*5 contains a missense mutation, 6582G>T, creating a Gly479Val amino acid change and resulting in a lack of enzyme function (Oscarson et al., 1999b).
CYP2A6*6 contains a missense mutation, 1703G>A, which creates an Arg128Gln amino acid change and lowers the enzymatic activity (Kitagawa et al., 2001). CYP2A6*7 contains a missense mutation, 6558T>C, which produces an Ile471Thr amino acid change that decreases the enzymatic activity toward some substrates in in vivo and in vitro assays (Uno et al., 2013; Xu et al., 2002).
CYP2A6*8 contains a missense mutation, 6600G>T, creating an amino acid change in Arg485Leu; however, the mutation's effect apparently does not change the enzymatic activity (Xu et al., 2002).
CYP2A6*9 contains a point mutation, -48T>G, in the TATA box located in the gene promoter, which results in a decrease of more than 50 % in enzymatic activity in in vitro and in vivo assays (Pitarque et al., 2001; Yoshida et al., 2003). Two subtypes of this allele have been identified: CYP2A6*9A, which contains an additional -1013A>G point mutation, and CYP2A6*9B, which contains the point mutations -1680A>G, -1301A>C, -1289G>A, 1620T>C, 1836G>T, 6354T>C and 6692C>G.
CYP2A6*10 contains the two point mutations found in the CYP2A6*7 and CYP2A6*8 alleles. Together, these point mutations decrease the enzymatic activity dramatically and make the enzyme completely inactive toward some substrates (Xu et al., 2002).
CYP2A6*11 contains a missense mutation, 3391T>C, which results in the amino acid change Ser224Pro, decreasing the enzymatic activity (Daigo et al., 2002). CYP2A6*12A originated from an unequal crossover between CYP2A6 and CYP2A7, which resulted in a hybrid allele in which the 5' UTR and exons 1-2 belong to CYP2A7 and exons 3 to 9 belong to CYP2A6. This generates 10 amino acid substitutions compared to the reference allele. Later, several SNPs were discovered in the same allele, which define two subvariants (CYP2A6*12B-C). These alleles are classified as having null enzymatic activity (Bloom et al., 2011).
CYP2A6*13 has two point mutations: the first, -48T>G, in the TATA box of the promoter and the second, 13G>A, which causes the amino acid change Gly5Arg. The enzymatic activity is decreased.
CYP2A6*14 has the missense mutation 86G>A, which causes the amino acid change Ser29Asn; however, this does not affect the enzymatic activity. CYP2A6*15 is the product of two point mutations: the first, -48T>G, in the TATA box of the promoter and the second, 2134A>G, which results in the amino acid change Lys194Glu. However, this enzyme does not show differences in substrate metabolism (Tiong et al., 2014).
CYP2A6*16 has a missense mutation, 2161C>A, which causes the amino acid change Arg203Ser; however, it does not cause a defect in the enzymatic activity (Tiong et al., 2010, 2014).
CYP2A6*17 carries several point mutations, 51G>A, 209C>T, 1779G>A, 4489C>T, 5065G>A, 5163G>A, 5717C>T and 5825A>G, which result in the amino acid change Val365Met and cause a decrease in the enzymatic activity of the allele.
CYP2A6*19 is produced by the point mutations 5668A>T, 6354T>C, and 6558T>C and a gene conversion at the 3' UTR with CYP2A7, which correspond to the amino acid changes Tyr392Phe and Ile471Thr, decreasing the enzymatic activity.
CYP2A6*20 has a frameshift mutation at nucleotides 2140 and 2141, which shifts the reading frame from codon 196 and produces a premature stop at codon 220. In addition, it has three point mutations, 51G>A, 5684T>C and 6692C>G. These mutations produce a loss-of-function enzyme (Mwenifumbo et al., 2008).
CYP2A6*21 is the result of two point mutations, 51G>A and 6573A>G, which produce the amino acid change Lys476Arg. However, the functional effect on the enzyme is still under discussion, because in vivo assays have been reported to show differences according to the study population (Al Koudsi et al., 2006; Mwenifumbo et al., 2008), whereas in vitro assays show normal enzymatic activity (Tiong et al., 2010, 2014).
CYP2A6*22 is the result of three point mutations, 51G>A, 1794C>G and 1798C>A, which cause the amino acid changes Asp158Glu and Leu160Ile. These mutations reduce the affinity of the enzyme for its substrates; CYP2A6*22 has 39 % of the enzyme activity of the reference allele (Tiong et al., 2010, 2014).
CYP2A6*23 contains a point mutation, 2161C>T, that corresponds to the amino acid change Arg203Cys; this decreases the enzymatic activity to 19 % of that of the reference allele.
CYP2A6*38 is the result of the missense mutation 5023A>G, which produces the amino acid change Tyr351His (Bloom et al., 2011). An in silico assay classified the SNP as harmful, suggesting decreased enzymatic activity (Bloom et al., 2011). CYP2A6*39 was described by Piliguian et al. (2014) as a missense mutation, 468G>A; however, the HCAND ("CYP2A6 allele nomenclature," 2014) adds the point mutations 171C>A, 1779G>A and 5717C>T, and lists the amino acid change Val68Met. The enzymatic activity of this allele is reported to be half that of the reference allele (Piliguian et al., 2014).
CYP2A6*41 contains the missense mutation 3515G>A according to Piliguian et al. (2014), but the HCAND ("CYP2A6 allele nomenclature," 2014) added the point mutations 51G>A and 507C>T; the resulting amino acid change is Arg265Gln. This allele was shown to have minimally altered expression and normal enzymatic activity (Piliguian et al., 2014).
CYP2A6*42 is the result of the missense mutation 3524T>C according to Piliguian et al. (2014), but the HCAND ("CYP2A6 allele nomenclature," 2014) added the mutations 51G>A and 5684T>C; the resulting amino acid change, Ile268Thr, decreases expression and enzymatic activity in in vivo and in vitro assays (Piliguian et al., 2014).
CYP2A6*43 has the missense mutation 4406C>T, which causes the amino acid change Thr303Ile and shows decreased expression and enzymatic activity in in vivo and in vitro assays (Piliguian et al., 2014).
CYP2A6*44 carries the missense mutation 5661G>A according to Piliguian et al. (2014), but the HCAND ("CYP2A6 allele nomenclature," 2014) later added the mutations 51G>A, 5738C>T, 5745A>G and 5750G>C; the resulting amino acid changes are Glu390Lys, Asn418Asp and Glu419Asp. These mutations have been shown to reduce enzymatic activity and expression to one-third of those of the reference allele (Piliguian et al., 2014).
CYP2A6*45 has the missense mutation 6531T>C according to Piliguian et al. (2014), but the HCAND ("CYP2A6 allele nomenclature," 2014) later added the point mutations 51G>A and 4464G>A; the resulting amino acid change is Leu462Pro. As in CYP2A6*44, the mutations have been shown to reduce enzymatic activity and expression to one-third of those of the reference allele (Piliguian et al., 2014).
There are two CYP2A6 gene duplications. CYP2A6*1X2A originated from an unequal crossover between the 8th and 9th exons, with CYP2A6*4D as the reciprocal product (Rao et al., 2000). CYP2A6*1X2B also originated from an unequal crossover with CYP2A7, located 5.2 to 5.6 kb downstream of the stop codon, with CYP2A6*4B as the reciprocal product (Fukami et al., 2007). Enzyme activity has been shown to increase in in vivo assays (Rao et al., 2000).
CYP2A6 is considered a highly polymorphic gene because it is located in a small chromosomal region that contains several genes and has undergone unequal crossover events, point mutations and gene conversions between CYP2A6 and CYP2A7 (Hoffman et al., 1995). These facts, together with evolutionary forces such as genetic drift and natural selection, may have produced this genetic variability, which spread among human populations (Ingelman-Sundberg, 2004). This genetic variability, similar to that of other CYP450 genes, could explain the metabolic response to exogenous compounds such as nicotine and other drugs, which ranges from 20-40 % (Ingelman-Sundberg, 2001).
CYP2A6 genotypes can be classified according to their alleles and their enzymatic activity, measured as the metabolic ratio of 3-hydroxycotinine to cotinine (Dempsey et al., 2004), as follows:
-Ultrarapid metabolizers. Individuals who have an enzymatic activity >100 % of normal; they carry more than two functional gene copies (the CYP2A6*1X2 alleles).
-Normal metabolizers. Individuals who have an enzymatic activity of 100 % (normal); they carry two functional alleles.
-Intermediate metabolizers. Individuals who have an enzymatic activity ≤75 % of normal; they may carry one functional and one defective allele, or two partially defective alleles.
-Slow metabolizers. Individuals who have an enzymatic activity ≤50 % of normal; they may carry one functional and one loss-of-function allele, or two loss-of-function alleles.
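The four categories above can be read as a simple threshold rule on enzymatic activity expressed as a percentage of normal. The following Python sketch illustrates one possible reading. The function name `classify_metabolizer` is hypothetical, and two points are assumptions not stated in the text: the slow threshold is checked before the intermediate one (because a value ≤50 % also satisfies ≤75 %), and values strictly between 75 % and 100 % are treated as normal.

```python
def classify_metabolizer(activity_percent: float) -> str:
    """Assign a CYP2A6 metabolizer category from activity as a percentage of normal.

    Thresholds follow the list above; the handling of the 75-100 % interval
    is an assumption made for this illustration only.
    """
    if activity_percent < 0:
        raise ValueError("activity cannot be negative")
    if activity_percent > 100:
        return "ultrarapid"
    if activity_percent <= 50:
        # Checked before the intermediate threshold, since <=50 also satisfies <=75.
        return "slow"
    if activity_percent <= 75:
        return "intermediate"
    # Assumption: anything above 75 % and up to 100 % counts as normal.
    return "normal"


if __name__ == "__main__":
    for value in (120, 100, 75, 60, 50, 0):
        print(value, "->", classify_metabolizer(value))
```

In practice the category would be inferred from the genotype or from a measured 3-hydroxycotinine/cotinine ratio rather than from a single percentage, so this sketch only makes the threshold logic explicit.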

EFFECTS OF CYP2A6 GENETIC VARIANTS ON TOBACCO CONSUMPTION
The genetic variability of CYP2A6 directly influences the rate of nicotine metabolism in the body, which can indirectly affect the reinforcing and aversive properties of nicotine in the brain and change the individual risk of nicotine dependence. To test the effect of CYP2A6 variants on tobacco consumption, several studies have been conducted using family, twin and unrelated-subject designs. These studies associate an allele with the amount of nicotine metabolized or with another variable related to tobacco consumption.
Smokers who carry certain CYP2A6 alleles show different smoking behavior compared to carriers of the wild-type allele, suggesting that smokers regulate their smoking behavior to obtain the desired nicotine levels in their body (Malaiyandi et al., 2005; Rao et al., 2000; Schoedel et al., 2004).
On the other hand, some reports did not find an association between null or decreased-activity CYP2A6 alleles and tobacco consumption (London et al., 1999; Loriot et al., 2001; Sabol and Hamer, 1999; Schulz et al., 2001; Tiihonen et al., 2000; Zhang et al., 2001). This lack of association may be due to several factors, such as population stratification in the study design (comparing populations in which cases and controls differ in substructure), population ethnicity, lack of detailed phenotypic evaluation, undetermined comorbidities, different genotyping methods, examination of different allelic variants, inconsistency in smoking history and differences in symptoms of nicotine dependence among smokers (Lerman and Niaura, 2002; O'Loughlin et al., 2004).
Detecting CYP2A6 alleles can allow us to characterize different smoking behaviors and smoking-related diseases among individuals (Fujieda et al., 2004), owing to the role of the enzyme in nicotine metabolism and in the metabolism of certain carcinogenic compounds. This could have implications for public health by reducing the harmful effects of smoking and by supporting personalized smoking cessation treatments according to individual genotype (Liu et al., 2011; Schoedel et al., 2004). However, these alleles have a specific distribution across worldwide populations.

POPULATION DISTRIBUTION OF CYP2A6 ALLELES
The CYP2A6 allele distribution has an interethnic pattern. Knowing the individual differences in nicotine metabolism may allow us to answer the following questions: Why do some people become regular smokers after initial exposure, while others experience negative reactions and discontinue use? Why do some people smoke in greater quantities than others? Why do different individuals not respond the same way to drug therapies to quit smoking? Why do some individuals develop smoking-related diseases faster than others? (Schoedel et al., 2004; Swan et al., 1997, 2005).
Therefore, we present the allele frequencies reported in populations where tobacco consumption, cancer and nicotine metabolism were studied, as well as in cohort and general population studies involving CYP2A6. The number of reports of each allele according to population group is summarized in Figure 2. For practical reasons, we show only the overall frequency, without distinguishing subvariants, for alleles that have them; the exception is the wild-type allele CYP2A6*1, whose frequency was not completely calculated because some studies assign unidentified genotypes to the reference allele.
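For readers less familiar with how such frequencies are obtained, an allele frequency in a diploid sample is the number of copies of the allele divided by the total number of chromosomes examined, that is, (2 × homozygous carriers + heterozygous carriers) / (2 × individuals genotyped). The Python sketch below illustrates this standard calculation. The function name `allele_frequency` and the counts in the example are hypothetical and do not come from any of the studies reviewed here; the sketch also assumes two gene copies per individual, which does not hold for carriers of gene deletions or duplications.

```python
def allele_frequency(n_homozygous: int, n_heterozygous: int, n_individuals: int) -> float:
    """Frequency of a variant allele in a diploid sample.

    n_homozygous:   individuals carrying two copies of the variant allele
    n_heterozygous: individuals carrying one copy
    n_individuals:  all genotyped individuals, including non-carriers
    """
    if n_individuals <= 0:
        raise ValueError("at least one genotyped individual is required")
    if n_homozygous < 0 or n_heterozygous < 0:
        raise ValueError("carrier counts cannot be negative")
    if n_homozygous + n_heterozygous > n_individuals:
        raise ValueError("carrier counts exceed the sample size")
    variant_copies = 2 * n_homozygous + n_heterozygous
    total_chromosomes = 2 * n_individuals
    return variant_copies / total_chromosomes


if __name__ == "__main__":
    # Hypothetical sample: 200 individuals, 1 homozygous and 8 heterozygous carriers.
    freq = allele_frequency(n_homozygous=1, n_heterozygous=8, n_individuals=200)
    print(f"{freq:.1%}")
```

Because the reference allele CYP2A6*1 is often assigned by exclusion, a frequency computed this way for *1 would absorb any alleles that a given study did not genotype, which is the limitation noted above.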
CYP2A6*22 has been reported at a frequency of less than 0.3 % in Caucasian populations, but not in Japanese or Korean populations.
CYP2A6*23 has a frequency in the range of 1-2 % (Mwenifumbo et al., 2008) in African populations, but has not been reported in Caucasian, Japanese or Chinese populations.
CYP2A6*35 has been reported at a frequency between 2.5-2.9 % (Al Koudsi et al., 2010;Ho et al., 2009) in African populations living in North American countries. In East Asian populations, such as Chinese, Japanese and Taiwanese, it has been found at a frequency of 0.5-0.8 % (Al Koudsi et al., 2010), but it has not been found in Caucasian or Alaska Yupik (Al Koudsi et al., 2010;Binnington et al., 2012) populations.
CYP2A6*36 and *37 have been reported in Taiwanese populations at a frequency of 0.3 %, but not in African, Caucasian, Chinese or Japanese populations (Al Koudsi et al., 2010).
Most studies treat the majority of a country's population as a general population. However, a few studies focus on more specific population classifications, such as ethnic and regional groups. In Asian populations, frequencies were studied among people from Tottori, Shimane, Ehime and Fukuoka, in the respective districts of Yonago, Izumo, Matsuyama and Kurume, Japan (Takeshita et al., 2006). In China, the prevailing Han Chinese group has been compared with the Uighur, Bouyei and Tibetan ethnic groups (Pang et al., 2015). In the South of India, the frequencies among people from the Andhra Pradesh, Karnataka, Kerala and Tamil Nadu regions were compared (Krishnakumar et al., 2012). In Iran, ethnic groups such as Turkomans from the Golestan Province, Turks from the Ardabil Province and Zoroastrian Persians from Tehran were studied (Sepehr et al., 2004). In Russia, the Tatar ethnic group was reported (Korytina et al., 2014). In African populations, the Ovambo ethnic group from Namibia was reported (Takeshita et al., 2006), as were ethnic groups such as the Akan, Guan, Ewe, Ga, Nzima and Dargarti; however, with few participants in each, these were combined into a single Ghanaian population (Gyamfi et al., 2005). In Oceania, the Māori ethnic group from New Zealand is the only population reported across the continent (Lea et al., 2008). In the American continent, ethnic groups such as the Yupik from Alaska and Canadian natives have been reported (Nowak et al., 1998; Schoedel et al., 2004).

CONCLUSIONS AND PERSPECTIVES
The enzyme mainly responsible for metabolizing nicotine is encoded by the CYP2A6 gene, which is highly polymorphic. This variability is due to changes in DNA, which have generated different responses to nicotine and, along with other factors, are reflected in individual smoking behaviors. It has been reported that this variability was generated and distributed over a long time in different human populations, showing well-defined ethnic patterns. Although association studies between carriers of certain variants and different smoking behaviors have been numerous and have yielded plausible results, population studies reporting the frequencies of these variants are few. General population studies provide more reliable frequency information than association studies, whose reported frequencies are variable because their population requirements are usually more specific, creating a population bias. However, it would be advisable to apply this type of methodology in populations at higher risk of smoking, in those where policies to control smoking are less efficient, and in those where more smoking-related diseases are reported. Additionally, the population must be characterized by more specific requirements, such as including ancestry informative markers and avoiding "self-reporting" as the sole classification criterion. While there is a Human CYP-Allele Nomenclature Database in which the genetic findings on CYP2A6 are unified, it needs to be supplemented with updated data on population distribution. All of this could contribute to a personalized system that could detect, prevent and treat populations at risk of smoking and, in consequence, avoid diseases related to tobacco consumption.